AITopics | super arm

Collaborating Authors

super arm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu

Neural Information Processing SystemsApr-21-2026, 15:27:42 GMT

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions such as the max() function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve O(log T) distribution-dependent regret and O( T) distribution-independent regret, where T is the time horizon. We apply our results to the K-MAX problem and expected utility maximization problems. In particular, for K-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first O( T) bound on the (1)-approximation regret of its online problem, for any > 0.

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Game Theory (0.71)

Add feedback

Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

Özyıldırım, Emre, Yaycı, Barış, Akturk, Umut Eren, Tekin, Cem

arXiv.org Machine LearningApr-17-2026

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $τ_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $τ_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $τ_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $τ_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $τ_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.

artificial intelligence, assignment, machine learning, (19 more...)

arXiv.org Machine Learning

2604.14908

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)

Genre: Research Report (0.64)

Industry: Telecommunications (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.88)

Add feedback

c92a10324374fac681719d63979d00fe-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 03:48:43 GMT

base arm, minw, super arm, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (0.67)
Consumer Products & Services > Travel (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.68)

Add feedback

Combinatorial Pure Exploration with Bottleneck Reward Function

Neural Information Processing SystemsFeb-11-2026, 03:48:38 GMT

In this paper, we study the Combinatorial Pure Exploration problem with the Bottleneck reward function (CPE-B) under the fixed-confidence (FC) and fixed-budget (FB) settings.

artificial intelligence, base arm, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Industry:

Transportation > Air (0.68)
Consumer Products & Services > Travel (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference

Wang, Zichen, Hong, Haoyang, Li, Chuanhao, Li, Haoxuan, Zhang, Zhiheng, Wang, Huazheng

arXiv.org Artificial IntelligenceOct-14-2025

In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. While existing research on MABNI largely concentrates on minimizing regret, it often overlooks the crucial concern that an excessive emphasis on the optimal arm can undermine the inference accuracy for sub-optimal arms. Although initial efforts have been made to address this trade-off in single-unit scenarios, these challenges have become more pronounced in the context of MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, $\texttt{EXP3-N-CS}$, specifically designed to balance the trade-off between regret minimization and inference accuracy in this setting.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.07646

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Information Technology > Data Science > Data Mining > Big Data (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

Neural Information Processing SystemsOct-2-2025, 17:06:37 GMT

We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB).

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.50)

Add feedback

Combinatorial Pure Exploration with Bottleneck Reward Function

Neural Information Processing SystemsAug-17-2025, 08:47:26 GMT

In this paper, we study the Combinatorial Pure Exploration problem with the Bottleneck reward function (CPE-B) under the fixed-confidence (FC) and fixed-budget (FB) settings.

artificial intelligence, machine learning, minw, (17 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (0.67)
Consumer Products & Services > Travel (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.68)

Add feedback

c92a10324374fac681719d63979d00fe-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 08:47:23 GMT

artificial intelligence, base arm, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Industry:

Transportation > Air (0.68)
Consumer Products & Services > Travel (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Representative Action Selection for Large Action-Space Meta-Bandits

Zhou, Quan, Kozdoba, Mark, Mannor, Shie

arXiv.org Machine LearningMay-27-2025

We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple epsilon-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2505.18269

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit

Wu, Xiaoyi, Ji, Bo, Li, Bin

arXiv.org Artificial IntelligenceJan-10-2025

Combinatorial Multi-Armed Bandit with fairness constraints is a framework where multiple arms form a super arm and can be pulled in each round under uncertainty to maximize cumulative rewards while ensuring the minimum average reward required by each arm. The existing pessimistic-optimistic algorithm linearly combines virtual queue-lengths (tracking the fairness violations) and Upper Confidence Bound estimates as a weight for each arm and selects a super arm with the maximum total weight. The number of super arms could be exponential to the number of arms in many scenarios. In wireless networks, interference constraints can cause the number of super arms to grow exponentially with the number of arms. Evaluating all the feasible super arms to find the one with the maximum total weight can incur extremely high computational complexity in the pessimistic-optimistic algorithm. To avoid this, we develop a low-complexity fair learning algorithm based on the so-called pick-and-compare approach that involves randomly picking $M$ feasible super arms to evaluate. By setting $M$ to a constant, the number of comparison steps in the pessimistic-optimistic algorithm can be reduced to a constant, thereby significantly reducing the computational complexity. Our theoretical proof shows this low-complexity design incurs only a slight sacrifice in fairness and regret performance. Finally, we validate the theoretical result by extensive simulations.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.00924

Country:

North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Communications > Networks (0.88)

Add feedback